1.  INTRODUCTION


Some of the problems of signal detection considered by electrical engineers can be described in terms of a linear model

$$Y = \xi + \eta \tag{1.1}$$

where $\xi$ is an n-vector signal, $\eta$ is an n-vector noise variable perturbing the signal during transmission, and $Y$ is the received message. The general problem associated with the model (1.1) is that of detecting the presence of a signal in a received message and estimating it when present. Different types of problems arise depending on the nature of $\xi$ and $\eta$.

We consider two types of problems: one where $\xi$ is considered as a specified vector or a specified function of unknown parameters, and another where $\xi$ is considered as a stochastic vector distributed independently of the noise vector $\eta$.

The following notations are used. $A'$ denotes the transpose of a matrix $A$ when its elements are real, and $A^*$ the conjugate transpose of $A$ when its elements are complex.


(i) $X \sim N_p(\mu,\Sigma)$, i.e., a real p-vector $X$ has a p-variate real normal distribution with the probability density function (p.d.f.)

$$(2\pi)^{-p/2}\,|\Sigma|^{-1/2}\exp\left[-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right]. \tag{1.2}$$

(ii) $X \sim \tilde N_p(\mu,\Sigma)$, i.e., a complex p-vector $X$ has a p-variate complex normal distribution with the p.d.f.

$$\pi^{-p}\,|\Sigma|^{-1}\exp\left[-(x-\mu)^*\Sigma^{-1}(x-\mu)\right]. \tag{1.3}$$


$$H_0 : \xi = 0 \quad\text{versus}\quad H_1 : \xi = \delta \ (\text{specified}) \tag{2.1.2}$$

Rejection of $H_0$ at a chosen level of significance would indicate that the received message $X$ contains the signal $\delta$ and is not pure noise.

It may be noted that when p = 1, the appropriate test of $H_0$ is the one-sided t test

$$t = \frac{a^{1/2}X}{(s/f)^{1/2}} \ge c \ \text{ if } \delta > 0 \quad (\text{or } t \le -c \text{ if } \delta < 0) \tag{2.1.3}$$

on f degrees of freedom. When p > 1, one generally uses Hotelling's

$$T^2 = \frac{a(f-p+1)}{p}\,X'S^{-1}X \tag{2.1.4}$$

which, under $H_0$, is distributed as F on p and $f-p+1$ degrees of freedom. The test (2.1.4), however, does not involve the specified $\delta$. A more powerful test than (2.1.4) was recently suggested by Khatri and Rao (1985b), based on the following considerations.

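Let C be a $p\times(p-1)$ matrix of rank $p-1$ such that $\delta'C = 0$, and consider the transformation

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} \delta' \\ C' \end{pmatrix} X, \qquad V = (\delta : C)'\,S\,(\delta : C). \tag{2.1.5}$$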

Then 


(2.1.6) 
and the problem (2.1.2) reduces to testing

$$H_0 : E(Y_1) = 0 \quad\text{versus}\quad H_1 : E(Y_1) = \delta'\delta > 0, \quad \text{given } E(Y_2) = 0. \tag{2.1.7}$$


Such a problem involving a conditional test was considered in Rao (1946). The appropriate test is $t \ge t_\alpha$, where

$$t = \frac{\{a(f-p+1)\}^{1/2}\,\delta'S^{-1}X}{\left[(1 + aX'S^{-1}X)\,\delta'S^{-1}\delta - a(\delta'S^{-1}X)^2\right]^{1/2}} \tag{2.1.8}$$

has, under $H_0$, a t distribution on $(f-p+1)$ degrees of freedom, and $t_\alpha$ is the upper $\alpha$ point of the t distribution.

In testing (2.1.7), we used the condition $E(Y_2) = 0$. But in practice it may be necessary to check whether this holds. For this, we use Hotelling's

$$T^2 = \frac{a(f-p+2)}{p-1}\left(X'S^{-1}X - \frac{(\delta'S^{-1}X)^2}{\delta'S^{-1}\delta}\right) \tag{2.1.9}$$

which has an F distribution on $(p-1)$ and $(f-p+2)$ degrees of freedom.
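As an illustration of how (2.1.8) and (2.1.9) might be computed, here is a minimal NumPy/SciPy sketch. It treats $a$ as a known positive constant (its definition is not reproduced in this section), simulates a pure-noise message with $\Sigma = I$ purely for illustration, and uses hypothetical function names `conditional_t` and `orthogonal_T2`. It also checks numerically the identity behind (2.1.9): for any C with $\delta'C = 0$, $X'S^{-1}X - (\delta'S^{-1}X)^2/\delta'S^{-1}\delta = (C'X)'(C'SC)^{-1}(C'X)$.

```python
import numpy as np
from scipy import stats
from scipy.linalg import null_space


def conditional_t(X, S, delta, a, f):
    """Real-case t statistic (2.1.8); referred to t on f - p + 1 d.f."""
    p = X.size
    SiX = np.linalg.solve(S, X)                 # S^{-1} X
    Sid = np.linalg.solve(S, delta)             # S^{-1} delta
    dSX, dSd, XSX = delta @ SiX, delta @ Sid, X @ SiX
    denom = (1.0 + a * XSX) * dSd - a * dSX ** 2
    return np.sqrt(a * (f - p + 1)) * dSX / np.sqrt(denom)


def orthogonal_T2(X, S, delta, a, f):
    """Real-case statistic (2.1.9) for E(Y2) = 0; referred to F on (p-1, f-p+2) d.f."""
    p = X.size
    SiX = np.linalg.solve(S, X)
    quad = X @ SiX - (delta @ SiX) ** 2 / (delta @ np.linalg.solve(S, delta))
    return a * (f - p + 2) / (p - 1) * quad


rng = np.random.default_rng(0)
p, f, a = 4, 30, 1.0
delta = np.array([1.0, 0.5, -0.5, 0.0])
W = rng.standard_normal((f, p))
S = W.T @ W                                     # S ~ W_p(f, I), illustrative only
X = rng.standard_normal(p) / np.sqrt(a)         # pure-noise message (hypothetical setup)

t = conditional_t(X, S, delta, a, f)
T2 = orthogonal_T2(X, S, delta, a, f)
p_t = stats.t.sf(t, df=f - p + 1)               # small p_t suggests the signal delta is present
p_F = stats.f.sf(T2, p - 1, f - p + 2)          # small p_F casts doubt on E(Y2) = 0

# Identity behind (2.1.9), with C an orthonormal basis of the complement of delta
C = null_space(delta[None, :])                  # p x (p-1), delta' C = 0
Y2 = C.T @ X
SiX = np.linalg.solve(S, X)
lhs = X @ SiX - (delta @ SiX) ** 2 / (delta @ np.linalg.solve(S, delta))
rhs = Y2 @ np.linalg.solve(C.T @ S @ C, Y2)
```

The identity is what makes (2.1.9) a Hotelling statistic in the $p-1$ coordinates $Y_2 = C'X$, which is why its denominator degrees of freedom are $f-(p-1)+1 = f-p+2$.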

The tests (2.1.8) and (2.1.9) can be extended to the case where $\eta$ and $S$ have complex normal and Wishart distributions respectively. For example, the test for $H_0 : \mu = 0$ versus $H_1 : \mu = \delta$ is $t \ge t_\alpha$, where

$$t = \frac{[2(f-p+1)]^{1/2}\,\operatorname{Re}(\delta^*S^{-1}X)}{\left[(a^{-1} + X^*S^{-1}X)\,\delta^*S^{-1}\delta - \delta^*S^{-1}XX^*S^{-1}\delta\right]^{1/2}} \tag{2.1.10}$$

has a t distribution on $2(f-p+1)$ degrees of freedom. Further, the hypothesis that $E(Y_2) = 0$ is tested by Hotelling's

$$T^2 = \frac{a(f-p+2)}{p-1}\left(X^*S^{-1}X - \frac{|\delta^*S^{-1}X|^2}{\delta^*S^{-1}\delta}\right) \tag{2.1.11}$$

which has an F distribution on $2(p-1)$ and $2(f-p+2)$ degrees of freedom.
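The complex-case statistics can be sketched the same way. This is an illustration only, under the same assumptions as before ($a$ known, hypothetical function names), and it uses `np.vdot`, which conjugates its first argument, to form $\delta^*S^{-1}X$. Complex normal draws are scaled so that $E|z|^2 = 1$, matching (1.3) with $\Sigma = I$.

```python
import numpy as np
from scipy import stats


def complex_conditional_t(X, S, delta, a, f):
    """Complex-case statistic (2.1.10); referred to t on 2(f - p + 1) d.f."""
    p = X.size
    SiX = np.linalg.solve(S, X)
    dSX = np.vdot(delta, SiX)                            # delta^* S^{-1} X
    dSd = np.vdot(delta, np.linalg.solve(S, delta)).real # real for Hermitian S
    XSX = np.vdot(X, SiX).real
    denom = (1.0 / a + XSX) * dSd - abs(dSX) ** 2
    return np.sqrt(2 * (f - p + 1)) * dSX.real / np.sqrt(denom)


def complex_orthogonal_T2(X, S, delta, a, f):
    """Complex-case statistic (2.1.11); referred to F on 2(p-1), 2(f-p+2) d.f."""
    p = X.size
    SiX = np.linalg.solve(S, X)
    dSd = np.vdot(delta, np.linalg.solve(S, delta)).real
    quad = np.vdot(X, SiX).real - abs(np.vdot(delta, SiX)) ** 2 / dSd
    return a * (f - p + 2) / (p - 1) * quad


def cnormal(rng, size):
    """Standard complex normal draws with E|z|^2 = 1."""
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)


rng = np.random.default_rng(1)
p, f, a = 4, 30, 1.0
Zs = cnormal(rng, (f, p))
S = Zs.T @ Zs.conj()                    # sum of f outer products z z^*: complex Wishart, Sigma = I
X = cnormal(rng, p) / np.sqrt(a)        # pure-noise message (hypothetical setup)
delta = np.array([1.0, 1j, 0.5, -0.5j])

t = complex_conditional_t(X, S, delta, a, f)
T2 = complex_orthogonal_T2(X, S, delta, a, f)
p_t = stats.t.sf(t, df=2 * (f - p + 1))
p_F = stats.f.sf(T2, 2 * (p - 1), 2 * (f - p + 2))
```

Both statistics are unchanged when $\delta$ is rescaled by a positive constant, and (2.1.11) is unchanged under any nonzero complex rescaling, which is a convenient sanity check on an implementation.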


2.2  Discrimination  between  noise  and  a  specified  signal 

In Section 2.1, we considered the problem of testing for pure noise against a specified signal on the basis of an observed message $X$ and an independent estimate $f^{-1}S$ of $\Sigma$. The interpretation of such a test at a chosen level of significance is not simple, especially when the same estimate $f^{-1}S$ of $\Sigma$ is used repeatedly to test for signals in a number of incoming future messages. In order to provide a satisfactory solution, we consider the problem of signal detection as one of discrimination between the alternative populations $N_p(0,\Sigma)$ (or $\tilde N_p(0,\Sigma)$) and $N_p(\delta,\Sigma)$ (or $\tilde N_p(\delta,\Sigma)$), when $\Sigma$ is unknown but an estimate $f^{-1}S$ of $\Sigma$ is available. In such a case, the estimated linear discriminant function (LDF) is proportional to

$$y = \delta'S^{-1}X, \quad (\text{or } \delta^*S^{-1}X). \tag{2.2.1}$$

[In the sequel we follow the practice of giving the results for the real case first and then for the complex case within brackets, as in (2.2.1).] The discriminatory power of (2.2.1) when applied to future observations is a monotone function of the discrimination index (DI)

$$\rho(S,\Sigma) = \frac{[E(y \mid S, \xi=\delta) - E(y \mid S, \xi=0)]^2}{V(y \mid S)} = \frac{(\delta'S^{-1}\delta)^2}{\delta'S^{-1}\Sigma S^{-1}\delta}, \quad \left(\text{or } \tilde\rho(S,\Sigma) = \frac{(\delta^*S^{-1}\delta)^2}{\delta^*S^{-1}\Sigma S^{-1}\delta}\right). \tag{2.2.2}$$
If $\Sigma$ is known, the true LDF is $\delta'\Sigma^{-1}X$ (or $\delta^*\Sigma^{-1}X$), and the optimal DI is

$$\rho(\Sigma,\Sigma) = \delta'\Sigma^{-1}\delta, \quad (\text{or } \tilde\rho(\Sigma,\Sigma) = \delta^*\Sigma^{-1}\delta), \tag{2.2.3}$$

which is greater than or equal to (2.2.2) by the Cauchy-Schwarz inequality, so that there is, in general, a loss of information in using an estimated $\Sigma$.
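The DI (2.2.2) and the optimal DI (2.2.3) are easy to compute when $\Sigma$ is known, for example in a simulation study. The sketch below (hypothetical function names, an arbitrary "true" $\Sigma$ generated at random) evaluates both, confirms the Cauchy-Schwarz inequality, and shows that the plug-in value $\rho(S, f^{-1}S)$ reduces to $f\,\delta'S^{-1}\delta$.

```python
import numpy as np


def discrimination_index(S, Sigma, delta):
    """DI (2.2.2) of the LDF built from S; works for real or complex Hermitian matrices."""
    u = np.linalg.solve(S, delta)              # S^{-1} delta
    num = np.vdot(delta, u).real ** 2          # (delta^* S^{-1} delta)^2
    den = np.vdot(u, Sigma @ u).real           # delta^* S^{-1} Sigma S^{-1} delta
    return num / den


def optimal_index(Sigma, delta):
    """Optimal DI (2.2.3): delta^* Sigma^{-1} delta."""
    return np.vdot(delta, np.linalg.solve(Sigma, delta)).real


rng = np.random.default_rng(2)
p, f = 5, 12
L = rng.standard_normal((p, p))
Sigma = L @ L.T + p * np.eye(p)                # arbitrary "true" covariance (hypothetical)
delta = rng.standard_normal(p)
W = rng.standard_normal((f, p)) @ np.linalg.cholesky(Sigma).T
S = W.T @ W                                    # S ~ W_p(f, Sigma)

rho = discrimination_index(S, Sigma, delta)    # realized DI; needs the unknown Sigma
rho_opt = optimal_index(Sigma, delta)
efficiency = rho / rho_opt                     # this ratio is the quantity B of (2.2.4)
plug_in = discrimination_index(S, S / f, delta)
```

In practice `rho` cannot be computed because $\Sigma$ is unknown; only `plug_in` is available, which motivates the pivotal-quantity approach that follows.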

The expression (2.2.2) involves the unknown covariance matrix $\Sigma$, and therefore the discriminatory power realized by using a particular estimate of $\Sigma$ in classifying future observations remains unknown. We raise the question whether an estimate of (2.2.2) can be obtained in terms of known (observed) values, to provide some idea of the performance of the estimated LDF. One such is the plug-in estimate $\rho(S, f^{-1}S)$ (or $\tilde\rho(S, f^{-1}S)$), but it is known to be a highly biased estimator of (2.2.2). In a recent paper, Khatri and Rao (1985a) provided a satisfactory solution, which is as follows.

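It is shown that in the real case

$$B = \frac{(\delta'S^{-1}\delta)^2}{(\delta'\Sigma^{-1}\delta)(\delta'S^{-1}\Sigma S^{-1}\delta)} \quad\text{and}\quad G = \frac{\delta'\Sigma^{-1}\delta}{\delta'S^{-1}\delta} \tag{2.2.4}$$

are independently distributed, with the p.d.f. of $B$ as

$$\frac{\Gamma\left(\frac{f+1}{2}\right)}{\Gamma\left(\frac{f-p+2}{2}\right)\Gamma\left(\frac{p-1}{2}\right)}\, b^{(f-p)/2}(1-b)^{(p-3)/2}, \quad 0 < b < 1, \tag{2.2.5}$$

and that of $G$ as

$$\frac{1}{2^{(f-p+1)/2}\,\Gamma\left(\frac{f-p+1}{2}\right)}\, e^{-g/2}\, g^{(f-p-1)/2}, \quad g > 0. \tag{2.2.6}$$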


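In the complex case, defining $\tilde B$ and $\tilde G$ with $\delta'$ replaced by $\delta^*$ in (2.2.4), it is shown that $\tilde B$ and $\tilde G$ are independently distributed, with the p.d.f. of $\tilde B$ as

$$\frac{\Gamma(f+1)}{\Gamma(f-p+2)\,\Gamma(p-1)}\, b^{f-p+1}(1-b)^{p-2}, \quad 0 < b < 1, \tag{2.2.7}$$

and that of $\tilde G$ as

$$\frac{1}{\Gamma(f-p+1)}\, e^{-g}\, g^{f-p}, \quad g > 0. \tag{2.2.8}$$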


The distribution (2.2.7) was earlier obtained by Reed, Mallett and Brennan (1974). The distributions (2.2.5)-(2.2.8) are independent of the unknown parameters, which enables us to draw inferences on $\rho(S,\Sigma)$ (or $\tilde\rho(S,\Sigma)$) through the pivotal quantities $B$ and $G$ (or $\tilde B$ and $\tilde G$), as discussed below.

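1. Using the expressions for the moments of the beta distribution (Rao (1973), p. 168),

$$E(B) = E(\tilde B) = \frac{f-p+2}{f+1},$$

we have

$$E[\rho(S,\Sigma)] = \frac{f-p+2}{f+1}\,\delta'\Sigma^{-1}\delta, \quad \left(\text{or } E[\tilde\rho(S,\Sigma)] = \frac{f-p+2}{f+1}\,\delta^*\Sigma^{-1}\delta\right). \tag{2.2.9}$$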

If the average efficiency is to be maintained at about half the optimal efficiency, then from (2.2.9) we have

$$\frac{f-p+2}{f+1} \approx \frac{1}{2}, \quad\text{or}\quad f \approx 2p \tag{2.2.10}$$

(exact equality gives $f = 2p-3$) for both real and complex cases, i.e., the degrees of freedom on which $\Sigma$ is estimated should be at least twice the number of components of the signal.
This result for the complex case is mentioned in Reed, Mallett and Brennan (1974). Similarly, we can equate the ratio in (2.2.9) to any desired ratio other than 1/2 and find the degrees of freedom f needed for the estimation of $\Sigma$.

2. Perhaps a more satisfactory way of using the distributions (2.2.5) and (2.2.7) is as follows. Let $b_\alpha$ (or $\tilde b_\alpha$) be the lower $100\alpha\%$ point of the distribution (2.2.5) (or (2.2.7)). Then we can make the confidence statement that

$$\rho(S,\Sigma) \ge b_\alpha\,\delta'\Sigma^{-1}\delta, \quad (\text{or } \tilde\rho(S,\Sigma) \ge \tilde b_\alpha\,\delta^*\Sigma^{-1}\delta) \tag{2.2.11}$$

with a confidence coefficient of $100(1-\alpha)\%$.

3. The results (2.2.10) and (2.2.11) still involve the unknown quantity $\Sigma$. We raise the question whether the actual magnitude of $\rho(S,\Sigma)$ for given $S$ can be assessed through known values. Using the joint distribution of $B$ and $G$ as in (2.2.5) and (2.2.6), it is shown in Khatri and Rao (1985a) that $E[\rho(S,\Sigma) - gD^2]^2$ attains its minimum at $g = (f-p+2)(f-p-3)/[f(f+1)]$, so that

$$\rho(S,\Sigma) \approx \frac{(f-p+2)(f-p-3)}{f(f+1)}\,D^2, \tag{2.2.12}$$

whose right-hand side does not involve unknown parameters, is a close approximation to $\rho(S,\Sigma)$. Similarly,

$$\tilde\rho(S,\Sigma) \approx \frac{(f-p+2)(f-p-1)}{f(f+1)}\,\tilde D^2 \tag{2.2.13}$$

is a close approximation to $\tilde\rho(S,\Sigma)$.

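4. We can also obtain an exact lower confidence bound to $\rho(S,\Sigma)$, which provides a satisfactory answer to the problem raised. We define the random variable

$$Z = \tfrac{1}{2}BG, \quad (\text{or } \tilde Z = \tilde B\tilde G), \tag{2.2.14}$$

which has the confluent hypergeometric distribution with the p.d.f.

$$\frac{e^{-z} z^{m-1}\,\Gamma(a+b)}{\Gamma(m)\,\Gamma(a)}\,\Psi(b, m-a+1; z), \quad z > 0, \tag{2.2.15}$$

where $m = (f-p+1)/2$, $a = (f-p+2)/2$, $b = (p-1)/2$ (or $m = f-p+1$, $a = f-p+2$, $b = p-1$), and

$$\Psi(b,c;z) = \frac{1}{\Gamma(b)}\int_0^\infty t^{b-1}(1+t)^{c-b-1}\exp(-zt)\,dt. \tag{2.2.16}$$

The percentage points of this distribution at various levels are tabulated in Khatri, Rao and Sun (1986). If $z_\alpha$ (or $\tilde z_\alpha$) is the lower $100\alpha\%$ point of this distribution then, since $\rho(S,\Sigma) = BG\,\delta'S^{-1}\delta$ (or $\tilde\rho(S,\Sigma) = \tilde B\tilde G\,\delta^*S^{-1}\delta$) by (2.2.4),

$$\rho(S,\Sigma) \ge 2z_\alpha\,\delta'S^{-1}\delta, \quad (\text{or } \tilde\rho(S,\Sigma) \ge \tilde z_\alpha\,\delta^*S^{-1}\delta) \tag{2.2.17}$$

provides the lower confidence bound to $\rho(S,\Sigma)$ (or $\tilde\rho(S,\Sigma)$), with a confidence coefficient of $100(1-\alpha)\%$.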
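The results in points 1 to 3 above can be explored numerically. The sketch below (real case only; function names hypothetical) simulates $B$ and $G$ from Wishart matrices with $\Sigma = I$ and $\delta = e_1$, which loses nothing because their distributions do not involve $\Sigma$ or $\delta$. It checks $E(B)$ in (2.2.9), finds the smallest f meeting a target ratio as in (2.2.10), obtains $b_\alpha$ for (2.2.11) from the beta quantile, and compares the approximation (2.2.12) with the plug-in estimate. The definition $D^2 = f\,\delta'S^{-1}\delta$ (the plug-in $\rho(S, f^{-1}S)$) is an assumption inferred from the form of (2.2.12); it is not stated in this section.

```python
import numpy as np
from scipy import stats


def simulate_B_G(p, f, reps, rng):
    """Monte Carlo draws of (B, G) in (2.2.4), real case, with Sigma = I and delta = e_1.

    Also returns delta' S^{-1} delta for each draw.
    """
    W = rng.standard_normal((reps, f, p))
    Sinv = np.linalg.inv(np.einsum('rki,rkj->rij', W, W))   # S^{-1}, S ~ W_p(f, I)
    dSd = Sinv[:, 0, 0]                                      # delta' S^{-1} delta
    dSSd = np.sum(Sinv[:, 0, :] ** 2, axis=1)                # delta' S^{-1} Sigma S^{-1} delta
    return dSd ** 2 / dSSd, 1.0 / dSd, dSd


def min_df(p, ratio):
    """Smallest f with (f - p + 2)/(f + 1) >= ratio, cf. (2.2.9)-(2.2.10)."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie in (0, 1)")
    f = p                                   # S must be non-singular, so f >= p
    while f - p + 2 < ratio * (f + 1):      # the ratio increases with f
        f += 1
    return f


def b_alpha(alpha, p, f, complex_case=False):
    """Lower 100*alpha % point of (2.2.5), or of (2.2.7) in the complex case."""
    if complex_case:
        return stats.beta.ppf(alpha, f - p + 2, p - 1)
    return stats.beta.ppf(alpha, (f - p + 2) / 2.0, (p - 1) / 2.0)


def rho_approx(D2, p, f, complex_case=False):
    """Approximation (2.2.12), or (2.2.13), to the realized DI."""
    k = (f - p - 1) if complex_case else (f - p - 3)
    return (f - p + 2) * k / (f * (f + 1)) * D2


rng = np.random.default_rng(3)
p, f = 4, 20
B, G, dSd = simulate_B_G(p, f, 20000, rng)
rho = B                     # rho(S, Sigma) = B * delta'Sigma^{-1}delta, which equals 1 here
D2 = f * dSd                # ASSUMED definition: D^2 = f delta'S^{-1}delta
mse_plug_in = np.mean((rho - D2) ** 2)
mse_approx = np.mean((rho - rho_approx(D2, p, f)) ** 2)
coverage = np.mean(B >= b_alpha(0.05, p, f))   # frequency with which (2.2.11) holds
```

With p = 10 and a target ratio of 1/2, `min_df` returns 17, i.e. $2p-3$, in line with the remark after (2.2.10).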

Several approximations to the distribution (2.2.15) are also obtained in Khatri, Rao and Sun (1986) when f is large compared to p, from which fairly accurate percentage points can be easily obtained.
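Short of the published tables, the distribution (2.2.15) can be evaluated directly with SciPy, since `scipy.special.hyperu(b, c, z)` computes Tricomi's confluent hypergeometric function, which coincides with $\Psi(b, c; z)$ in (2.2.16). The sketch below (function names hypothetical) checks that the density integrates to one, finds $z_\alpha$ by root-finding on the numerically integrated c.d.f., and checks by simulation the coverage of the real-case bound $\rho(S,\Sigma) \ge 2z_\alpha\,\delta'S^{-1}\delta$. That bound rests on the real-case scaling $Z = BG/2$ with $G \sim \chi^2_{f-p+1}$; treat it as an assumption to be checked against Khatri, Rao and Sun (1986), not as their tabulated procedure.

```python
import numpy as np
from scipy import integrate, optimize, special


def z_params(p, f, complex_case=False):
    """Parameters (m, a, b) of (2.2.15)."""
    if complex_case:
        return f - p + 1.0, f - p + 2.0, p - 1.0
    return (f - p + 1) / 2.0, (f - p + 2) / 2.0, (p - 1) / 2.0


def z_pdf(z, m, a, b):
    """Density (2.2.15); special.hyperu(b, c, z) is Tricomi's Psi(b, c; z) of (2.2.16)."""
    log_const = special.gammaln(a + b) - special.gammaln(m) - special.gammaln(a)
    return np.exp(log_const - z + (m - 1.0) * np.log(z)) * special.hyperu(b, m - a + 1.0, z)


def z_cdf(z, m, a, b):
    """c.d.f. of (2.2.15) by numerical integration."""
    return integrate.quad(z_pdf, 0.0, z, args=(m, a, b), limit=200)[0]


def z_lower_point(alpha, m, a, b):
    """Lower 100*alpha % point; Z is stochastically below Gamma(m, 1), so 2m + 10 brackets it."""
    return optimize.brentq(lambda z: z_cdf(z, m, a, b) - alpha, 1e-10, 2.0 * m + 10.0)


def rho_lower_bound(z_alpha, dSd, complex_case=False):
    """Bound (2.2.17), ASSUMING Z = BG/2 (real) or Z~ = B~G~ (complex)."""
    return (1.0 if complex_case else 2.0) * z_alpha * dSd


p, f, alpha = 4, 20, 0.05
m, a, b = z_params(p, f)                     # real case: m = 8.5, a = 9, b = 1.5
total = integrate.quad(z_pdf, 0.0, 4.0 * m + 60.0, args=(m, a, b), points=[m], limit=200)[0]
z_a = z_lower_point(alpha, m, a, b)

# Coverage by simulation, Sigma = I and delta = e_1 (the pivotal law does not depend on them)
rng = np.random.default_rng(4)
W = rng.standard_normal((20000, f, p))
Sinv = np.linalg.inv(np.einsum('rki,rkj->rij', W, W))
dSd = Sinv[:, 0, 0]                                   # delta' S^{-1} delta
rho = dSd ** 2 / np.sum(Sinv[:, 0, :] ** 2, axis=1)   # realized DI (2.2.2)
coverage = np.mean(rho >= rho_lower_bound(z_a, dSd))  # should be close to 1 - alpha
```

Only observed quantities ($\delta$, $S$, $f$, $p$) enter `rho_lower_bound`, which is the point of (2.2.17): the realized discriminatory power is bounded below without knowing $\Sigma$.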


3.  PROBLEMS  INVOLVING  A  RANDOM  SIGNAL 
Following the recent papers by Wax and Kailath (1984) and Bai, Krishnaiah and Zhao (1986), let us consider the model

$$X(t) = A\,s(t) + n(t), \quad t = t_1, \ldots, t_n, \tag{3.1}$$

where $X(t)$ is a p-vector message received at time t, $s(t)$ is a q-vector random signal, $n(t)$ is a p-vector noise component, and $A$ is an unknown $p\times q$ matrix with elements independent of t. The following assumptions are made regarding the model (3.1).

(i) $s(t_i)$, $i = 1, \ldots, n$, are i.i.d. with the common distribution $N_q(0,\Psi)$ (or $\tilde N_q(0,\Psi)$), $q < p$.

(ii) $n(t_i)$, $i = 1, \ldots, n$, are i.i.d. with the common distribution $N_p(0,\sigma^2\Sigma_1)$ (or $\tilde N_p(0,\sigma^2\Sigma_1)$).

(iii) $s(t_i)$ and $n(t_j)$ are independent for all i and j.

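Under these assumptions

$$X(t) \sim N_p(0, \Sigma_2 = \Gamma + \sigma^2\Sigma_1), \quad (\text{or } \tilde N_p(0, \Sigma_2 = \Gamma + \sigma^2\Sigma_1)),$$

where

$$\Gamma = A\Psi A', \quad (\text{or } A\Psi A^*) \tag{3.2}$$

is of rank $q < p$. Further, if

$$S_2 = \sum_{i=1}^{n} X(t_i)X(t_i)', \quad \left(\text{or } \sum_{i=1}^{n} X(t_i)X(t_i)^*\right), \tag{3.3}$$

then

$$S_2 \sim W_p(n,\Sigma_2), \quad (\text{or } S_2 \sim \tilde W_p(n,\Sigma_2)). \tag{3.4}$$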


We consider the problem of testing

$$H_0 : \operatorname{rank}\Gamma = q < p \quad\text{versus}\quad \operatorname{rank}\Gamma \text{ is arbitrary} \tag{3.5}$$

and also of estimating q, the rank of $\Gamma$.


3.1  Case 1 : $\Sigma_1 = I$


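In this case $\Sigma_2 = \Gamma + \sigma^2 I$, and it is well known (see Anderson (1963)) that the likelihood ratio criterion for testing (3.5) is

$$L_q = \left[\frac{\ell_{q+1}\cdots\ell_p}{\left[(\ell_{q+1}+\cdots+\ell_p)/(p-q)\right]^{p-q}}\right]^{s} \tag{3.1.1}$$

where $\ell_1 \ge \cdots \ge \ell_p$ are the eigenvalues of $n^{-1}S_2$, and $s = n/2$ in the real case and $s = n$ in the complex case. In large samples, i.e., as $n \to \infty$, under $H_0$,

$$-2\log L_q \sim \chi^2\left(\frac{(p-q)(p-q+1)}{2} - 1\right), \quad \left(\text{or } \chi^2\left((p-q)^2 - 1\right)\right). \tag{3.1.2}$$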

As for the estimation of q, Zhao, Krishnaiah and Bai (1986a) suggested a new information theoretic criterion which is more general than those proposed by Akaike (1972), Schwarz (1978) and Rissanen (1978). Their method consists in choosing as an estimate of q the number $\hat q$ such that

$$I(\hat q, C_n) = \min\{I(0,C_n), \ldots, I(p-1,C_n)\} \tag{3.1.3}$$

where

$$I(k,C_n) = -\log L_k + C_n\,\nu(k,p), \qquad \nu(k,p) = 1 + \frac{k(2p-k+1)}{2}, \quad (\text{or } 1 + k(2p-k)),$$

which is the number of free parameters when $\operatorname{rank}\Gamma = k$, and the $C_n$ are such that

$$\lim_{n\to\infty} \frac{C_n}{n} = 0, \qquad \lim_{n\to\infty}\frac{C_n}{\log\log n} = \infty. \tag{3.1.4}$$


Zhao, Krishnaiah and Bai (1986a) proved that $\hat q$ determined as in (3.1.3) is a strongly consistent estimate of q.
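The criterion (3.1.3) needs only the eigenvalues of $n^{-1}S_2$. The sketch below (function names hypothetical) computes $-\log L_k$ from (3.1.1) and minimizes $I(k, C_n)$. The choice $C_n = \log n$ is an illustrative assumption: it satisfies both limits in (3.1.4), but the section does not prescribe a particular $C_n$.

```python
import numpy as np


def neg_log_L_case1(ell, k, n, complex_case=False):
    """-log L_k from (3.1.1); ell holds the eigenvalues of S2/n in decreasing order."""
    s = n if complex_case else n / 2.0
    tail = np.asarray(ell[k:], dtype=float)
    return -s * (np.sum(np.log(tail)) - tail.size * np.log(tail.mean()))


def nu_case1(k, p, complex_case=False):
    """Number of free parameters when rank(Gamma) = k (the 1 accounts for sigma^2)."""
    return 1 + (k * (2 * p - k) if complex_case else k * (2 * p - k + 1) / 2.0)


def estimate_q_case1(ell, n, complex_case=False, C_n=None):
    """Minimize I(k, C_n) of (3.1.3) over k = 0, ..., p-1."""
    ell = np.sort(np.asarray(ell, dtype=float))[::-1]
    p = ell.size
    C_n = np.log(n) if C_n is None else C_n    # log n satisfies (3.1.4); illustrative choice
    crit = [neg_log_L_case1(ell, k, n, complex_case) + C_n * nu_case1(k, p, complex_case)
            for k in range(p)]
    return int(np.argmin(crit)), crit


# Simulated example: p = 6, q = 2, Sigma_1 = I, sigma^2 = 1 (hypothetical values)
rng = np.random.default_rng(5)
p, q, n = 6, 2, 500
A = 3.0 * rng.standard_normal((p, q))
X = rng.standard_normal((n, q)) @ A.T + rng.standard_normal((n, p))   # rows are X(t_i)'
ell = np.linalg.eigvalsh(X.T @ X / n)[::-1]                         # eigenvalues of S2/n
q_hat, _ = estimate_q_case1(ell, n)
```

Each extra rank adds $C_n$ times the increase in $\nu(k,p)$, so a larger $C_n$ makes the estimate more conservative; the conditions (3.1.4) are what secure strong consistency.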


3.2  Case 2 : $\Sigma_1 = I$, $\sigma = 1$


In this case, the likelihood ratio criterion for testing the hypothesis (3.5) is derived by Zhao, Krishnaiah and Bai (1986) as

$$\log L_q = s \sum_{i=1+\min(r,q)}^{p} (\log \ell_i + 1 - \ell_i) \tag{3.2.1}$$

where $s = n/2$ in the real case and $s = n$ in the complex case, and r is the number of eigenvalues which are greater than unity. The large-sample distribution of $-2\log L_q$ is no longer $\chi^2$. But, as suggested by Rao (1983) in a slightly different situation, the test of the hypothesis in this case can be carried out in two stages: first as in Case 1 (Section 3.1), taking $\sigma$ as unknown, and then examining whether $\sigma = 1$. However, the criterion of Zhao, Krishnaiah and Bai can be used with (3.2.1) for the estimation of q, i.e., by minimizing

$$-\log L_k + C_n\,\nu(k,p) \tag{3.2.2}$$

where $C_n$ is as in (3.1.4) and

$$\nu(k,p) = \frac{k(2p-k+1)}{2}, \quad (\text{or } k(2p-k)). \tag{3.2.3}$$
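With $\sigma = 1$ known, only the eigenvalues of $n^{-1}S_2$ below the first $\min(r,k)$ contribute to (3.2.1). A minimal sketch of the estimator (3.2.2), with hypothetical function names and the same illustrative choice $C_n = \log n$ as before:

```python
import numpy as np


def neg_log_L_case2(ell, k, n, complex_case=False):
    """-log L_k from (3.2.1); r is the number of eigenvalues of S2/n exceeding 1."""
    s = n if complex_case else n / 2.0
    ell = np.sort(np.asarray(ell, dtype=float))[::-1]
    r = int(np.sum(ell > 1.0))
    tail = ell[min(r, k):]                       # i = 1 + min(r, k), ..., p
    return -s * np.sum(np.log(tail) + 1.0 - tail)


def estimate_q_case2(ell, n, complex_case=False, C_n=None):
    """Minimize (3.2.2) over k = 0, ..., p-1, with nu(k, p) from (3.2.3)."""
    p = len(ell)
    C_n = np.log(n) if C_n is None else C_n      # illustrative choice satisfying (3.1.4)
    nu = (lambda k: k * (2 * p - k)) if complex_case else (lambda k: k * (2 * p - k + 1) / 2.0)
    crit = [neg_log_L_case2(ell, k, n, complex_case) + C_n * nu(k) for k in range(p)]
    return int(np.argmin(crit)), crit


# Simulated example with sigma = 1 and Sigma_1 = I (hypothetical values)
rng = np.random.default_rng(9)
p, q, n = 6, 2, 500
A = 3.0 * rng.standard_normal((p, q))
X = rng.standard_normal((n, q)) @ A.T + rng.standard_normal((n, p))
ell = np.linalg.eigvalsh(X.T @ X / n)[::-1]
q_hat, _ = estimate_q_case2(ell, n)
```

Each summand $\log\ell_i + 1 - \ell_i$ is at most zero and vanishes only at $\ell_i = 1$, so $-\log L_k$ measures how far the trailing eigenvalues are from the known noise level.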


3.3  Case 3 : $\Sigma_1$ arbitrary and $\sigma$ unknown


In addition to $S_2$ defined in (3.3), we suppose that another independent variable $S_1$ is observable and has the distribution

$$S_1 \sim W_p(f,\Sigma_1), \quad (\text{or } S_1 \sim \tilde W_p(f,\Sigma_1)). \tag{3.3.1}$$
Let $\ell_1 \ge \cdots \ge \ell_p$ be the roots of the equation $|n_2^{-1}S_2 - \ell f^{-1}S_1| = 0$. The likelihood ratio criterion for the hypothesis (3.5) is derived by Rao (1983) in the form

$$-2\log L_q = -s\log\prod_{i=q+1}^{p}\left[\left(\frac{(n_2+f)\hat\sigma^2}{n_2\ell_i + f\hat\sigma^2}\right)^{n_2+f}\left(\frac{\ell_i}{\hat\sigma^2}\right)^{n_2}\right] \tag{3.3.2}$$

where $s = 1$ in the real case and $s = 2$ in the complex case, and $\hat\sigma^2$ is the root of the equation

$$\frac{(p-q)\,n_2}{f+n_2} = \sum_{i=q+1}^{p}\frac{n_2\ell_i}{n_2\ell_i + f\hat\sigma^2}. \tag{3.3.3}$$

As $n_2$ and $f \to \infty$, the statistic (3.3.2) is distributed under $H_0$ as

$$\chi^2\left(\frac{(p-q)(p-q+1)}{2} - 1\right), \quad \left(\text{or } \chi^2\left((p-q)^2 - 1\right)\right). \tag{3.3.4}$$
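In Case 3 the roots $\ell_i$ are generalized eigenvalues, which `scipy.linalg.eigh(a, b, eigvals_only=True)` returns in ascending order. The sketch below (function names and simulated parameters are hypothetical) solves (3.3.3) for $\hat\sigma^2$ with `brentq`, which is safe because the right-hand side of (3.3.3) decreases in $\hat\sigma^2$ and crosses the left-hand side between $\tfrac12\min\ell_i$ and $2\max\ell_i$. It then evaluates (3.3.2) and minimizes (3.3.5) with $\nu(k,p)$ from (3.3.6), using $C_n = \log n_2$ as an illustrative choice.

```python
import numpy as np
from scipy import linalg, optimize, stats


def generalized_roots(S1, f, S2, n2):
    """Roots of |n2^{-1} S2 - ell f^{-1} S1| = 0, in decreasing order."""
    return linalg.eigh(S2 / n2, S1 / f, eigvals_only=True)[::-1]


def sigma2_hat(tail, n2, f):
    """Root of (3.3.3) given the p - q smallest roots `tail` (all positive)."""
    tail = np.asarray(tail, dtype=float)
    target = tail.size * n2 / (f + n2)
    h = lambda s2: np.sum(n2 * tail / (n2 * tail + f * s2)) - target   # decreasing in s2
    return optimize.brentq(h, 0.5 * tail.min(), 2.0 * tail.max())


def neg2_log_L_case3(ell, q, n2, f, complex_case=False):
    """-2 log L_q of (3.3.2); ell sorted in decreasing order."""
    s = 2 if complex_case else 1
    tail = np.asarray(ell[q:], dtype=float)
    s2 = sigma2_hat(tail, n2, f)
    terms = (n2 + f) * np.log((n2 + f) * s2 / (n2 * tail + f * s2)) + n2 * np.log(tail / s2)
    return -s * np.sum(terms)


def estimate_q_case3(ell, n2, f, complex_case=False, C_n=None):
    """Minimize (3.3.5) over k = 0, ..., p-1, with nu(k, p) from (3.3.6)."""
    p = len(ell)
    C_n = np.log(n2) if C_n is None else C_n       # illustrative choice
    nu = lambda k: (k * (2 * p - k) if complex_case else k * (2 * p - k + 1) / 2.0) + 1
    crit = [0.5 * neg2_log_L_case3(ell, k, n2, f, complex_case) + C_n * nu(k) for k in range(p)]
    return int(np.argmin(crit)), crit


rng = np.random.default_rng(6)
p, q, n2, f = 5, 2, 500, 500
M = rng.standard_normal((p, p))
Sigma1 = M @ M.T / p + np.eye(p)                    # arbitrary noise covariance (hypothetical)
A = 5.0 * rng.standard_normal((p, q))
Sigma2 = A @ A.T + 1.5 * Sigma1                     # Gamma + sigma^2 Sigma_1, sigma^2 = 1.5


def wishart(df, Sigma):
    W = rng.standard_normal((df, p)) @ np.linalg.cholesky(Sigma).T
    return W.T @ W


S1, S2 = wishart(f, Sigma1), wishart(n2, Sigma2)
ell = generalized_roots(S1, f, S2, n2)
stat = neg2_log_L_case3(ell, q, n2, f)
p_value = stats.chi2.sf(stat, (p - q) * (p - q + 1) // 2 - 1)   # large-sample reference (3.3.4)
q_hat, _ = estimate_q_case3(ell, n2, f)
```

When the trailing roots are all equal, (3.3.3) returns their common value and (3.3.2) is zero, which is a quick check on an implementation.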


For estimating q, the method of Zhao, Krishnaiah and Bai is to minimize

$$-\log L_k + C_n\,\nu(k,p) \tag{3.3.5}$$

where $C_n$ is as in (3.1.4) and

$$\nu(k,p) = \frac{k(2p-k+1)}{2} + 1, \quad (\text{or } k(2p-k) + 1). \tag{3.3.6}$$


3.4  Case 4 : $\Sigma_1$ is arbitrary and $\sigma = 1$

In this case the likelihood ratio criterion is derived by Zhao, Krishnaiah and Bai (1986b) as

$$-2\log L_q = s\log\prod_{i=1+\min(q,r)}^{p}\left[\left(\frac{n_2\ell_i + f}{n_2+f}\right)^{n_2+f}\ell_i^{-n_2}\right] \tag{3.4.1}$$

where $s = 1$ for the real case and $s = 2$ for the complex case, and r is the number of $\ell_i$ which are greater than unity. The statistic does not have an asymptotic $\chi^2$ distribution, but it is useful in the estimation of q. As in the other cases, we choose q to minimize

$$-\log L_k + C_n\,\nu(k,p) \tag{3.4.2}$$

where $C_n$ is as in (3.1.4) and

$$\nu(k,p) = \frac{k(2p-k+1)}{2}, \quad (\text{or } k(2p-k)). \tag{3.4.3}$$

The estimates of q obtained in (3.2.2), (3.3.5) and (3.4.2) are strongly consistent as $n_2$ and $f \to \infty$. Detailed proofs are given in two papers by Zhao, Krishnaiah and Bai (1986a, 1986b).
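For Case 4, the statistic (3.4.1) is again a function of the generalized roots alone. A minimal sketch of (3.4.1) and the estimator (3.4.2), with hypothetical function names and the illustrative choice $C_n = \log n_2$:

```python
import numpy as np


def neg2_log_L_case4(ell, k, n2, f, complex_case=False):
    """-2 log L_k of (3.4.1); r is the number of roots ell_i exceeding 1."""
    s = 2 if complex_case else 1
    ell = np.sort(np.asarray(ell, dtype=float))[::-1]
    r = int(np.sum(ell > 1.0))
    tail = ell[min(r, k):]                             # i = 1 + min(k, r), ..., p
    return s * np.sum((n2 + f) * np.log((n2 * tail + f) / (n2 + f)) - n2 * np.log(tail))


def estimate_q_case4(ell, n2, f, complex_case=False, C_n=None):
    """Minimize (3.4.2) over k = 0, ..., p-1, with nu(k, p) from (3.4.3)."""
    p = len(ell)
    C_n = np.log(n2) if C_n is None else C_n          # illustrative choice
    nu = (lambda k: k * (2 * p - k)) if complex_case else (lambda k: k * (2 * p - k + 1) / 2.0)
    crit = [0.5 * neg2_log_L_case4(ell, k, n2, f, complex_case) + C_n * nu(k) for k in range(p)]
    return int(np.argmin(crit)), crit


ell = [8.0, 3.0, 1.0, 1.0, 1.0]                        # illustrative roots
q_hat, crit = estimate_q_case4(ell, 1000, 1000)
```

By concavity of the logarithm, each factor in (3.4.1) contributes a non-negative term that vanishes only at $\ell_i = 1$, the value implied by $\sigma = 1$.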


4.  EXPONENTIAL  SIGNAL  MODELS 


Let $y_t = \xi_t + \eta_t$, $t = 1, \ldots, n$, be observations taken at equal intervals of time on signals corrupted by noise. The signal is considered to be of the form

$$\xi_t = a_1e^{s_1t} + \cdots + a_me^{s_mt} \tag{4.1}$$

where $a' = (a_1, \ldots, a_m)$ and $s' = (s_1, \ldots, s_m)$ are unknown complex vector parameters. Often the value of m, the number of signals (exponential terms in (4.1)), is itself unknown and has to be considered as an essential parameter under estimation. The noise components are taken to be independently distributed with mean zero and common variance $\sigma^2$. The problem has a long history starting with the pioneering work of Prony (1795) two hundred years ago. In a series of papers, Tufts and Kumaresan (see Kumaresan (1982), Tufts and Kumaresan (1982) and Kumaresan and Tufts (1982), and the numerous references therein) suggested some new approaches to the problem based on Prony's parametrization of the signal process, described below.

Prony (1795) observed that the $\xi_t$ as defined in (4.1) satisfy the recurrence relations

$$\xi_{i+m} + g_1\xi_{i+m-1} + \cdots + g_m\xi_i = 0, \quad i = 1, \ldots, n-m-1, \tag{4.2}$$

where $g' = (g_m, \ldots, g_1, 1)$ is a function of s, which may be regarded as an alternative parameter to s. The equations (4.2) lead to the observational equations

$$-y_{i+m} = g_1y_{i+m-1} + \cdots + g_my_i, \quad i = 1, \ldots, n-m-1, \tag{4.3}$$

which, since $G\xi = 0$ by (4.2), can be written in the matrix form

$$GY = G\eta, \quad\text{with } Y' = (y_1, \ldots, y_n), \tag{4.4}$$

choosing the $(n-m-1)\times n$ matrix G appropriately, with each row containing the row vector $g'$ and a number of zeros, the position of $g'$ being shifted by one element when we go from one row to the next. Much of the previous work is centered on the equations (4.3) and (4.4) and the estimation of g by minimizing

$$Y^*G^*GY \quad\text{or}\quad Y^*G^*GY/g^*g \tag{4.5}$$


fixing an appropriate value for m. Once g is estimated, the $\exp(s_j)$ are obtained as the roots of the polynomial equation

$$0 = 1 + g_1z^{-1} + \cdots + g_mz^{-m} = \prod_{j=1}^{m}\left(1 - e^{s_j}z^{-1}\right), \tag{4.6}$$

and a is estimated by minimizing

$$(Y - Xa)^*(Y - Xa), \tag{4.7}$$

taking

$$X = \begin{pmatrix} \exp(s_1) & \cdots & \exp(s_m) \\ \vdots & & \vdots \\ \exp(s_1n) & \cdots & \exp(s_mn) \end{pmatrix} \tag{4.8}$$

as fixed. Tufts and Kumaresan, in the papers cited above, make some refinements by starting with a larger value, say $L > m$, and use the extra terms to reduce the noise part of the model. However, it is not clear whether minimizing (4.5), ignoring the correlations between the components of GY, leads to consistent estimators of the unknown parameters. It may be noted that the covariance matrix of GY as defined in (4.4) is $\sigma^2GG^*$, in which case the appropriate quadratic form to be minimized is

$$Y^*G^*(GG^*)^{-1}GY \tag{4.9}$$

and not (4.5), although the minimization problem associated with (4.9) is far more complicated.
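The classical (unweighted) Prony steps described above can be sketched as follows. This is an illustration only: it minimizes the unweighted form (4.5) with the last element of $g'$ fixed at 1, not the weighted form (4.9) and not Smyth's algorithm, and it uses all $n-m$ available equations (4.3). The function name and the test signal are hypothetical.

```python
import numpy as np


def prony(y, m):
    """Classical Prony fit of m exponential terms via (4.3)-(4.8); returns (a, s)."""
    y = np.asarray(y, dtype=complex)
    n = y.size
    # (4.3): -y_{i+m} = g_1 y_{i+m-1} + ... + g_m y_i; column k holds y_{i+m-k}
    A = np.column_stack([y[m - k:n - k] for k in range(1, m + 1)])
    g = np.linalg.lstsq(A, -y[m:], rcond=None)[0]
    # (4.6): the exp(s_j) are the roots of z^m + g_1 z^(m-1) + ... + g_m = 0
    z = np.roots(np.concatenate(([1.0], g))).astype(complex)
    # (4.7)-(4.8): least squares for a, with X[t-1, j] = exp(s_j t), t = 1..n
    X = z[None, :] ** np.arange(1, n + 1)[:, None]
    a = np.linalg.lstsq(X, y, rcond=None)[0]
    return a, np.log(z)


# Noise-free check with two damped complex exponentials (illustrative values)
z_true = np.array([0.9 * np.exp(0.5j), 0.7 * np.exp(-1.2j)])
a_true = np.array([1.0, 2.0 - 1.0j])
t = np.arange(1, 21)
y = (z_true[None, :] ** t[:, None]) @ a_true
a_hat, s_hat = prony(y, 2)
order, true_order = np.argsort(np.angle(np.exp(s_hat))), np.argsort(np.angle(z_true))
```

Without noise the recurrence holds exactly, so the roots and amplitudes are recovered to rounding error; with noise, this unweighted version is exactly the estimator whose consistency is questioned above.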

It is shown (see Smyth (1985) and references therein) that the estimates of a and s obtained from (4.6) and (4.7), using the g estimated from (4.9), are indeed maximum likelihood estimates, i.e., those obtained by minimizing

$$\sum_{t=1}^{n}\left|y_t - a_1e^{s_1t} - \cdots - a_me^{s_mt}\right|^2 \tag{4.10}$$

with respect to the $a_j$ and $s_j$, under the assumption that the error terms are normally distributed. We can also look upon the estimates of $a_j$ and $s_j$ obtained by minimizing (4.10) as non-linear least squares estimates without any distributional assumptions. Smyth (1985) has developed an efficient algorithm for obtaining the g which minimizes (4.9), and then estimating s and a through the steps in (4.6) and (4.7), which solves the non-linear least squares problem (4.10) for given m.

The choice of m can be made by a suitable information theoretic model selection criterion, such as the one used in the previous section. If the minimum value of (4.10) is denoted by $R_m$, then under the assumption of normality of the error components $\eta_t$, the information theoretic criterion takes the form

$$n\log R_m + C_n(4m) \tag{4.11}$$

where the $C_n$ are chosen to satisfy the conditions (3.1.4). The AIC criterion (Akaike (1972)) corresponds to the choice $C_n = 1$, which may be sufficient to provide a fairly accurate estimate of m.
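A rough sketch of choosing m by (4.11) follows. As an assumption, the classical Prony fit stands in for the exact minimizer of (4.10), so $R_m$ here is only an upper bound on the true minimum; $C_n = \log n$ is an illustrative choice (pass `C_n=1` for AIC). Function names and the simulated signal are hypothetical.

```python
import numpy as np


def prony(y, m):
    """Classical Prony fit; a stand-in for the exact minimizer of (4.10). Returns (a, z)."""
    n = y.size
    A = np.column_stack([y[m - k:n - k] for k in range(1, m + 1)])
    g = np.linalg.lstsq(A, -y[m:], rcond=None)[0]
    z = np.roots(np.concatenate(([1.0], g))).astype(complex)
    X = z[None, :] ** np.arange(1, n + 1)[:, None]
    return np.linalg.lstsq(X, y, rcond=None)[0], z


def choose_m(y, m_max, C_n=None):
    """Minimize n log R_m + C_n (4m), criterion (4.11), over m = 1, ..., m_max."""
    y = np.asarray(y, dtype=complex)
    n = y.size
    C_n = np.log(n) if C_n is None else C_n       # C_n = 1 gives AIC
    crit = []
    for m in range(1, m_max + 1):
        a, z = prony(y, m)
        fitted = (z[None, :] ** np.arange(1, n + 1)[:, None]) @ a
        R_m = np.sum(np.abs(y - fitted) ** 2)      # residual sum of squares, cf. (4.10)
        crit.append(n * np.log(R_m) + C_n * 4 * m) # 4 real parameters per complex term
    return int(np.argmin(crit)) + 1, crit


rng = np.random.default_rng(7)
n = 64
t = np.arange(1, n + 1)
z_true = np.array([0.95 * np.exp(0.5j), 0.9 * np.exp(-1.2j)])
y = (z_true[None, :] ** t[:, None]) @ np.array([1.0, 2.0 - 1.0j])
y = y + 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
m_hat, crit = choose_m(y, 3)
```

The factor 4m counts the real and imaginary parts of each $a_j$ and $s_j$.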

However, a more relevant and satisfactory method in finite samples is provided by the cross validation (CV) approach, although the computations may be extremely heavy. (See, for instance, papers by Rao (1984) and Rao and Boudreau (1985) for such an approach in a prediction problem.)

In the CV method we leave out one of the values, say $y_i$, but replace it by a variable $Y_i$. For any choices of $Y_i$ and m, using Smyth's algorithm, we compute

$$R(Y_i, m) = \min_{a,s}\left\{\sum_{t\ne i}\left|y_t - a_1e^{s_1t} - \cdots - a_me^{s_mt}\right|^2 + \left|Y_i - a_1e^{s_1i} - \cdots - a_me^{s_mi}\right|^2\right\} \tag{4.12}$$

where $a' = (a_1, \ldots, a_m)$ and $s' = (s_1, \ldots, s_m)$. Then for given m, by a suitable computer program, we find $\hat y_{im}$ such that

$$R(\hat y_{im}, m) = \min_{Y_i} R(Y_i, m), \tag{4.13}$$

which provides $\hat y_{im}$ as an estimate of $y_i$ for given m. Then, comparing $\hat y_{im}$ with the observed $y_i$, the cross validation error (CVE) is obtained as

$$R^*(m) = \sum_{i=1}^{n}\left|y_i - \hat y_{im}\right|^2, \tag{4.14}$$

and finally m is chosen as that value for which $R^*(m)$ is a minimum.

As observed earlier, the computations involved in the above procedure are extremely heavy. However, simplification may be effected in some ways.

1. If n is large, we may choose every alternate or every third value among the components of $(y_1, \ldots, y_n)$ for cross validation. This cuts down on the number of terms in (4.14) and reduces the computing time considerably.


2. It may be noted that $\hat y_{im}$ can be computed by an alternative method as

$$\hat y_{im} = \sum_{j=1}^{m}\hat a_je^{\hat s_ji} \tag{4.15}$$

where $\hat a_j$, $\hat s_j$, $j = 1, \ldots, m$, are the values minimizing the expression

$$\sum_{t=1}^{i-1}\left|y_t - \sum_{j=1}^{m}a_je^{s_jt}\right|^2 + \sum_{t=i+1}^{n}\left|y_t - \sum_{j=1}^{m}a_je^{s_jt}\right|^2. \tag{4.16}$$

Due to the absence of the term $y_i$ in (4.16), one cannot take full advantage of Prony's reparametrization. But if the optimum values $\hat a_j$ and $\hat s_j$ can be found directly through some other algorithm, then $\hat y_{im}$ can be obtained as in (4.15).


3. We defined $R(Y_i, m)$ as the minimum of the expression on the right-hand side of (4.12). For purposes of estimating m, one could use the approximation

$$R(Y_i, m) \approx \sum_{t=1}^{n}\left|y_t - \hat a_1e^{\hat s_1t} - \cdots - \hat a_me^{\hat s_mt}\right|^2 \quad (\text{with } y_i \text{ replaced by } Y_i) \tag{4.17}$$

where $\hat a_j$ and $\hat s_j$ are estimates obtained by methods such as those suggested by Tufts and Kumaresan.
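Simplifications 1 and 2 above can be combined in a short sketch: for every `step`-th observation, fit (4.16) without $y_i$ by non-linear least squares, predict $\hat y_{im}$ by (4.15), and accumulate the CVE (4.14) over that subset. `scipy.optimize.least_squares` is used here as the "other algorithm" of simplification 2; this is an assumption, not Smyth's algorithm. Starting values come from a classical Prony fit to the full series (so $y_i$ influences only the starting point, not the final fit). All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def prony(y, m):
    """Classical Prony fit, used only to supply starting values; returns (a, s)."""
    n = y.size
    A = np.column_stack([y[m - k:n - k] for k in range(1, m + 1)])
    g = np.linalg.lstsq(A, -y[m:], rcond=None)[0]
    z = np.roots(np.concatenate(([1.0], g))).astype(complex)
    a = np.linalg.lstsq(z[None, :] ** np.arange(1, n + 1)[:, None], y, rcond=None)[0]
    return a, np.log(z)


def model(theta, t, m):
    """xi_t = sum_j a_j exp(s_j t), with theta = (Re a, Im a, Re s, Im s)."""
    a = theta[:m] + 1j * theta[m:2 * m]
    s = theta[2 * m:3 * m] + 1j * theta[3 * m:]
    return np.exp(np.outer(t, s)) @ a


def loo_prediction(y, i, m, theta0):
    """(4.15)-(4.16): fit without y_i (0-based index i), then predict at t = i + 1."""
    t = np.arange(1, y.size + 1)
    keep = t != t[i]

    def resid(theta):
        r = y[keep] - model(theta, t[keep], m)
        return np.concatenate([r.real, r.imag])

    theta = least_squares(resid, theta0).x
    return model(theta, t[i:i + 1], m)[0]


def cv_error(y, m, step=4):
    """CVE (4.14) restricted to every `step`-th observation (simplification 1)."""
    y = np.asarray(y, dtype=complex)
    a0, s0 = prony(y, m)
    theta0 = np.concatenate([a0.real, a0.imag, s0.real, s0.imag])
    return sum(abs(y[i] - loo_prediction(y, i, m, theta0)) ** 2 for i in range(0, y.size, step))


rng = np.random.default_rng(8)
n = 40
t = np.arange(1, n + 1)
z_true = np.array([0.95 * np.exp(0.5j), 0.9 * np.exp(-1.2j)])
y = (z_true[None, :] ** t[:, None]) @ np.array([1.0, 2.0 - 1.0j])
y = y + 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
cve = {m: cv_error(y, m) for m in (1, 2)}
```

Each left-out point costs one non-linear fit, which is why simplification 1 (using only a subset of points) matters in practice.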